Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers

Authors

Stefania Degaetano-Ortlieb

Peter Fankhauser

Hannah Kermes

Ekaterina Lapshinova-Koltunski

Noam Ordan

Elke Teich

Abstract

We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundary with computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data come from the English Scientific Text Corpus (SCITEX), which covers a time range of roughly thirty years (1970/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based feature extraction with data mining techniques.
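To make the idea of a shallow feature set concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that shallow features are relative frequencies of whitespace-separated word tokens, and that the distance between two registers can be approximated by the Jensen-Shannon divergence between their frequency profiles. The function names `shallow_features` and `jensen_shannon` are hypothetical, and the abstract does not state which features or mining techniques were actually used.

```python
import math
from collections import Counter


def shallow_features(text):
    """Relative frequencies of lowercased, whitespace-separated tokens.

    Assumption: this stands in for a 'shallow' feature set; the paper's
    actual feature definitions are not given in the abstract.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles that share
    no tokens. Assumption: used here only as an illustrative distance.
    """
    divergence = 0.0
    for tok in set(p) | set(q):
        p_tok = p.get(tok, 0.0)
        q_tok = q.get(tok, 0.0)
        mean = (p_tok + q_tok) / 2
        if p_tok > 0:
            divergence += 0.5 * p_tok * math.log2(p_tok / mean)
        if q_tok > 0:
            divergence += 0.5 * q_tok * math.log2(q_tok / mean)
    return divergence
```

Under these assumptions, diversification over time could be probed by computing the divergence between a discipline and computer science for an early and a late time slice of a corpus and comparing the two values. A rising value would suggest that the registers are drifting apart on this particular feature set.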

Related resources

Scientific registers and disciplinary diversification: a comparable corpus approach

We present a study on linguistic contrast and commonality in English scientific discourse on the basis of a monolingually comparable corpus. The focus is on selected scientific disciplines at the boundary with computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data come from the English Scientific Text Corpus (SCITEX), which covers a time ran...

Feature Discovery for Diachronic Register Analysis: a Semi-Automatic Approach

In this paper, we present corpus-based procedures to semi-automatically discover features relevant for the study of recent language change in scientific registers. First, linguistic features potentially adherent to recent language change are extracted from the SciTex Corpus. Second, features are assessed for their relevance for the study of recent language change in scientific registers by mean...

Evaluation of the nutritional effects of fasting on cardiovascular diseases, using fuzzy data mining

Background: Advances in information technology and data collection methods have enabled high-speed collection and storage of huge amounts of data. Data mining can be used to derive laws from large data volumes and their characteristics. Similarly, fuzzy logic, by facilitating the understanding of events, is considered a suitable complement to scientific data mining. Materials and Methods: The pre...

Cross-Linguistic Transfer or Target Language Proficiency: Writing Performance of Trilinguals vs. Bilinguals in Relation to the Interdependence Hypothesis

This study explored the nature of transfer among bilinguals vs. trilinguals with varying levels of competence in English and their previous languages. The hypotheses were tested in writing tasks designed for 75 high-proficiency (N=35) vs. intermediate-proficiency (N=40) EFL learners with Turkish-Persian-English and Persian-English linguistic backgrounds. Qualitative data were also collected through some ...

Text-Mining: Application Development Challenges

This paper reviews best practices and challenges for project managers and developers involved in implementing text-mining applications. With a focus on rule-based information extraction, and with references to actual cases, the authors share their experiences from developing several text-mining applications in diverse industries. First, project management issues are discussed, including a process ...

Publication date: 2014
